NSF PAR Search | NSF Public Access Repository

Retrieval and Structuring Augmented Generation with LLMs for Web Applications

https://doi.org/10.1145/3701716.3715870

Jiao, Yizhu; Ouyang, Siru; Zhong, Ming; Zhang, Yunyi; Ding, Linyi; Zhou, Sizhe; Han, Jiawei (May 2025, ACM)

Free, publicly-accessible full text available May 8, 2026
Multimodal Search in Chemical Documents and Reactions

https://doi.org/10.1145/3726302.3730152

Shah, Ayush Kumar; Dey, Abhisek; Luo, Leo; Amador, Bryan; Philippy, Patrick; Zhong, Ming; Ouyang, Siru; Friday, David Mark; Bianchi, David; Jackson, Nick; et al (July 2025, ACM)

Free, publicly-accessible full text available July 13, 2026
Ontology Enrichment for Effective Fine-grained Entity Typing

https://doi.org/10.1145/3637528.3671857

Ouyang, Siru; Huang, Jiaxin; Pillai, Pranav; Zhang, Yunyi; Zhang, Yu; Han, Jiawei (August 2024, ACM)
Baeza-Yates, Ricardo; Bonchi, Francesco (Ed.)
Fine-grained entity typing (FET) is the task of identifying specific entity types at a fine-grained level for entity mentions based on their contextual information. Conventional methods for FET require extensive human annotation, which is time-consuming and costly given the massive scale of data. Recent studies have been developing weakly supervised or zero-shot approaches. We study the setting of zero-shot FET where only an ontology is provided. However, most existing ontology structures lack rich supporting information and even contain ambiguous relations, making them ineffective in guiding FET. Recently developed language models, though promising in various few-shot and zero-shot NLP tasks, may face challenges in zero-shot FET due to their lack of interaction with task-specific ontology. In this study, we propose OnEFET, where we (1) enrich each node in the ontology structure with two categories of extra information: instance information for training sample augmentation and topic information to relate types with contexts, and (2) develop a coarse-to-fine typing algorithm that exploits the enriched information by training an entailment model with contrasting topics and instance-based augmented training samples. Our experiments show that OnEFET achieves high-quality fine-grained entity typing without human annotation, outperforming existing zero-shot methods by a large margin and rivaling supervised methods. OnEFET also enjoys strong transferability to unseen and finer-grained types. Code is available at https://github.com/ozyyshr/OnEFET.
Full Text Available
Automated Mining of Structured Knowledge from Text in the Era of Large Language Models

https://doi.org/10.1145/3637528.3671469

Zhang, Yunyi; Zhong, Ming; Ouyang, Siru; Jiao, Yizhu; Zhou, Sizhe; Ding, Linyi; Han, Jiawei (August 2024, ACM)
Baeza-Yates, Ricardo; Bonchi, Francesco (Ed.)
Massive amounts of unstructured text data are generated daily, ranging from news articles to scientific papers. How to mine structured knowledge from the text data remains a crucial research question. Recently, large language models (LLMs) have shed light on the text mining field with their superior text understanding and instruction-following ability. There are typically two ways of utilizing LLMs: fine-tuning the LLMs with human-annotated training data, which is labor-intensive and hard to scale; or prompting the LLMs in a zero-shot or few-shot way, which cannot take advantage of the useful information in the massive text data. Therefore, automated mining of structured knowledge from massive text data remains a challenge in the era of large language models. In this tutorial, we cover the recent advancements in mining structured knowledge using language models with very weak supervision. We will introduce the following topics in this tutorial: (1) introduction to large language models, which serves as the foundation for recent text mining tasks, (2) ontology construction, which automatically enriches an ontology from a massive corpus, (3) weakly-supervised text classification in flat and hierarchical label space, (4) weakly-supervised information extraction, which extracts entity and relation structures.
Full Text Available
ActionIE: Action Extraction from Scientific Literature with Programming Languages

https://doi.org/10.18653/v1/2024.acl-long.683

Zhong, Xianrui; Du, Yufeng; Ouyang, Siru; Zhong, Ming; Luo, Tingfeng; Ho, Qirong; Peng, Hao; Ji, Heng; Han, Jiawei (January 2024, Association for Computational Linguistics)

Full Text Available
Compositional Data Augmentation for Abstractive Conversation Summarization

https://doi.org/10.18653/v1/2023.acl-long.82

Ouyang, Siru; Chen, Jiaao; Han, Jiawei; Yang, Diyi (July 2023, Association for Computational Linguistics)

Recent abstractive conversation summarization systems generally rely on large-scale datasets with annotated summaries. However, collecting and annotating these conversations can be a time-consuming and labor-intensive task. To address this issue, in this work, we present a sub-structure level compositional data augmentation method, COMPO, for generating diverse and high-quality pairs of conversations and summaries. Specifically, COMPO first extracts conversation structures like topic splits and action triples as basic units. Then we organize these semantically meaningful conversation snippets compositionally to create new training instances. Additionally, we explore noise-tolerant settings in both self-training and joint-training paradigms to make the most of these augmented samples. Our experiments on benchmark datasets, SAMSum and DialogSum, show that COMPO substantially outperforms prior baseline methods by achieving a nearly 10% increase of ROUGE scores with limited data.
ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision

https://doi.org/10.18653/v1/2023.findings-acl.767

Zhong, Ming; Ouyang, Siru; Jiang, Minhao; Hu, Vivian; Jiao, Yizhu; Wang, Xuan; Han, Jiawei (July 2023, Association for Computational Linguistics)

Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design. Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts. Consequently, the scarcity of sufficient training data poses an obstacle to the progress of related models in this domain. In this paper, we propose REACTIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions. Additionally, we adopt synthetic data from patent records as distant supervision to incorporate domain knowledge into the model. Experiments demonstrate that REACTIE achieves substantial improvements and outperforms all existing baselines.
Full Text Available
Instruct and Extract: Instruction Tuning for On-Demand Information Extraction

https://doi.org/10.18653/v1/2023.emnlp-main.620

Jiao, Yizhu; Zhong, Ming; Li, Sha; Zhao, Ruining; Ouyang, Siru; Ji, Heng; Han, Jiawei (January 2023, Association for Computational Linguistics)

Full Text Available
The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions

https://doi.org/10.18653/v1/2023.emnlp-main.146

Ouyang, Siru; Wang, Shuohang; Liu, Yang; Zhong, Ming; Jiao, Yizhu; Iter, Dan; Pryzant, Reid; Zhu, Chenguang; Ji, Heng; Han, Jiawei (January 2023, Association for Computational Linguistics)

Full Text Available
Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data

https://doi.org/10.18653/v1/2023.emnlp-demo.36

Zhong, Ming; Ouyang, Siru; Jiao, Yizhu; Kargupta, Priyanka; Luo, Leo; Shen, Yanzhen; Zhou, Bobby; Zhong, Xianrui; Liu, Xuan; Li, Hongxiang; et al (January 2023, Association for Computational Linguistics)

Full Text Available
